Abstract


Much technology assessment and organization design data exists in
Microsoft Excel® spreadsheets. Tools are needed to put this data into a form that 
can be used by design managers to make design decisions. One need is to cluster 
data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) 
and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently 
combines the DSM and a GA to solve the clustering problem. This paper describes a 
new software tool that interfaces a GA written as an Excel macro with a DSM in 
spreadsheet format. The results of several test cases are included to demonstrate 
how well this new tool works. 

Introduction 

In many complex projects, it is helpful to find subsets that have a minimum coupling or 
interaction with other subsets. These subsets may be organizations, activities, people, or 
processes. This grouping of subsets is sometimes called clustering, where individual clusters 
contain most, if not all, of the interactions within the cluster, and the interactions among clusters 
are reduced or eliminated. Another method is to cluster tightly coupled processes together. This 
makes it easier to visualize the influences processes might have on one another. There are many 
applications for clustering, ranging from simple problems, such as clustering the components of a
bicycle to make assembly easier; to medium problems, such as clustering organizations that closely
interact with each other so they may be placed in close proximity to one another; to more
complex problems, such as clustering the different aspects of a Mars program to aid in assigning personnel
who should work closely together. All of these activities and their couplings can easily be
displayed in an Excel™ spreadsheet. A software tool is needed that will rearrange the rows and
columns of the spreadsheet to cluster like pieces together.


The purpose of this study is to develop an Excel macro that couples a Genetic Algorithm 
(GA) to a Dependency Structure Matrix (DSM) to cluster tightly coupled processes around the 
diagonal of the matrix. The DSM is a tool for displaying processes and their couplings (ref. 1).
For an array dimension greater than 10, the manipulation of the rows and columns to cluster 
along a diagonal is tedious and the convergence process towards a solution is not obvious (which 
rows to interchange, etc.). The DSM (also referred to as a Design Structure Matrix or a 
Dependency Structure Method) is based on graph theory where processes (nodes) are placed 
along the diagonal and their couplings (arcs) are in the off-diagonal elements (ref. 2). The user 
determines the information contained in the diagonal and off-diagonal elements based on the 
problem being solved. The DSM is a very flexible tool and has been applied to a wide variety of 
projects including component-based DSM for modeling system structure based on component 
interrelationships, people-based DSM for modeling organization structure based on information 
flow among people in groups, activity-based DSM for modeling project schedule and activity 
sequencing based on interactivity information flow, and parameter-based DSM for modeling low 
level relationships between decisions and parameters (ref. 3). The DSM has also been applied to 
clustering problems (refs. 4 and 5) and to clustering processes along the diagonal of the DSM (ref. 6).
Much of the research has been done by MIT for developing management tools. 



The use of GA’s has been instrumental in achieving good solutions to discrete optimization 
problems that have not been satisfactorily addressed by other methods (ref. 7). GA’s can rapidly 
search a very complex design space. The GA has been applied to a DSM problem to find the 
optimum sequence for the processes of a complex design project based on time and cost (ref. 8). 
GA’s have also been applied to clustering problems (refs. 9 and 10) such as clustering of like
flowers and clustering of patients with multiple sclerosis who have like symptoms. This project 
combines the DSM and a GA for clustering into a single powerful tool written as an Excel macro. 
Although GA’s and DSM’s have been applied separately to clustering problems such as 
clustering the components of an automobile climate control system, the literature search
indicated that this is the first time a GA has been coupled with the DSM to solve a clustering 
problem. 

There are numerous Excel spreadsheets containing DSM’s with processes that design managers 
wanted to cluster around the diagonal. Before the development of this tool, the process clustering 
for these problems was typically done by manual manipulation of the DSM. Now, by applying 
this new tool and changing a few user input parameters, the design manager can rapidly examine 
many possible combinations and optimize the clustering of the processes around the diagonal. 


The Dependency Structure Matrix 


The Dependency Structure Matrix (DSM), originally formulated by Steward, is a tool for 
displaying the sequence of processes (ref. 1). A sample DSM is shown in figure 1. 
Figure 1. Dependency Structure Matrix

In this DSM, the organization numbers are located in the yellow diagonal cells and their couplings
are in the off diagonal cells of the matrix. The numbers in the off diagonal cells indicate the 
strength of the coupling between two organizations. The larger the number, the stronger the 
coupling is. The strongest couplings are seen in the red off diagonal elements. If the number in 
the off diagonal cell is 0 then there is no coupling between the two organizations. 

Sequence Optimization 

Several software tools have been developed to analyze the DSM. One of these tools is 
called DeMAID (Design Manager’s Aid for Intelligent Decomposition, ref. 11). The most recent
version of DeMAID is called DeMAID/GA to reflect the addition of a Genetic Algorithm (GA) to
optimize the ordering of the processes within the iterative subcycles. The fitness function of the
GA is based on minimizing the time and/or cost of each iterative subcycle. The fitness function uses
the time and cost of each process and a factor dependent on the coupling magnitude of each
feedback coupling. The resulting DSM will display the processes in the optimum sequence
(top left to bottom right) for execution. 
Clustering

In a clustering problem, the DSM is typically symmetric and contains tightly coupled 
processes. Processes “a” and “b” are considered to be tightly coupled when there are off diagonal 
elements coupling process “a” to process “b” and process “b” to process “a”. The strength of the 
coupling is indicated by the number in the off diagonal element coupling the two processes; the
larger the number, the stronger the coupling. For example, in the Langley line-organization study,
the coupling was rated a 0, 1, or 2, where 2 indicates a stronger coupling than 1 or 0. The object
of the clustering algorithm is not to minimize feedback couplings or to optimize the execution
sequence, but to reorder the sequence by grouping the tightly coupled processes into clusters
around the diagonal. The results can be used to determine which line-organizations should be 
located near each other. 

The Genetic Algorithm 

The use of GA’s has been instrumental in achieving good solutions to discrete 
optimization problems that have not been satisfactorily addressed by other methods (ref. 7). The 
GA searches a population of design points, coded as finite-length, finite-alphabet strings. 
Successive populations are produced primarily by the operations of selection, crossover, and 
mutation. Frequently, a binary coding is used with the GA; the values of the design variables are 
coded as binary numbers (1’s and 0’s) and then concatenated into a string. While this approach
works well with numerical problems, it is not efficient for the clustering problem (refs. 12 and 13).
This GA uses a direct representation of the order as a coding of an n-process system, with each 
integer 1 through n used only once. 
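
As a minimal sketch of this coding (in Python rather than the Visual Basic of the tool itself), a string can be held as a list of the integers 1 through n, and applying it to a DSM means permuting rows and columns together. The function name `reorder_dsm` and the list-of-lists matrix layout are assumptions made for illustration; they are not taken from the macro.

```python
def reorder_dsm(dsm, order):
    """Return a copy of the square matrix `dsm` with rows and columns permuted.

    `order` is a 1-based permutation string: order[k] is the original process
    number that is placed at position k + 1 of the new sequence.
    """
    n = len(dsm)
    # A valid string uses each integer 1..n exactly once.
    if sorted(order) != list(range(1, n + 1)):
        raise ValueError("order must use each integer 1..n exactly once")
    idx = [p - 1 for p in order]  # convert to 0-based indices
    # The same permutation is applied to rows and columns so that each
    # process stays on the diagonal.
    return [[dsm[r][c] for c in idx] for r in idx]


# Hypothetical 3-process DSM: process 1 is strongly coupled to process 3.
dsm = [
    [0, 0, 2],
    [0, 0, 1],
    [2, 1, 0],
]
reordered = reorder_dsm(dsm, [1, 3, 2])
print(reordered)  # [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
```

In the reordered matrix the coupling of strength 2 sits next to the diagonal, which is the effect the clustering GA is looking for.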

Selection 

The selection operator determines those members of the population that survive to 
participate in the production of members of the next population. Selection is based on the value 
of the fitness function, or the fitness of the individual members, such that members with greater 
fitness levels tend to survive. The tournament selection operator is applied to select members of 
the mating pool. To fill the mating pool, two strings are randomly selected without replacement 
from the parent pool and compared (a tournament); the one with greater fitness is included in the 
mating pool. The same member can be selected more than once and this selection process does 
not guarantee that the most fit member will be passed along to the mating pool. 
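
The tournament described above might be sketched as follows. This is an illustration in Python, not the macro's code. It assumes that the fitness value is a penalty to be minimized (as in the Fitness Function section), so the "greater fitness" member of a pair is taken to be the one with the lower penalty. The names `tournament_select`, `penalties`, and `pool_size` are hypothetical.

```python
import random


def tournament_select(population, penalties, pool_size, rng):
    """Fill a mating pool by repeated two-member tournaments.

    `penalties[i]` is the fitness value of `population[i]`; lower is assumed
    to be fitter. Two distinct members are drawn for each tournament, but a
    member may win several tournaments and so appear more than once.
    """
    pool = []
    for _ in range(pool_size):
        i, j = rng.sample(range(len(population)), 2)  # two distinct members
        winner = i if penalties[i] <= penalties[j] else j
        pool.append(population[winner])
    return pool


rng = random.Random(0)
population = [[1, 2, 3], [2, 1, 3], [3, 2, 1]]
penalties = [30, 10, 20]
pool = tournament_select(population, penalties, pool_size=50, rng=rng)
```

Because each tournament draws two distinct members, the least fit member can never enter the pool, while the most fit member enters only if it happens to be drawn, which matches the remark that its survival is not guaranteed.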

Crossover 

The crossover operator is the recombination of traits of the selected members, called the 
mating pool, in the hope of producing a child with better fitness levels than its parents. Crossover 
is accomplished by swapping parts of the string into which these design points have been coded. 
Because the strings consist of integers and not just binary 1’s and 0’s, crossover is accomplished
by position-based (ref. 11) crossover as shown in Figure 2. Several processes (e.g., 1, 4, 5, and 6)
are chosen from the first parent and placed in the same positions in the child string. Then, the
processes (e.g., 2, 3, and 7) that were not taken from the first parent are taken from the second
parent to fill the holes in the child string in the order in which they appear in the second parent. 
The result is a complete string with one and only one copy of each process number. 

Parent 1 => 1 7 2 4 3 5 6
Parent 2 => 6 5 2 1 3 7 4

Randomly select positions 1, 4, 6, and 7 (process numbers 1, 4, 5, and 6) from Parent 1
Child => 1 _ _ 4 _ 5 6

Fill in from Parent 2 based on encountering missing process numbers
Child => 1 2 3 4 7 5 6

Figure 2. Position Based Crossover Function 
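
The example in Figure 2 can be reproduced with a short Python sketch of position-based crossover. The function name and the use of 0-based positions are assumptions for illustration. In the GA the kept positions would be chosen at random, but here they are passed in explicitly so the result can be compared with the figure.

```python
def position_based_crossover(parent1, parent2, keep_positions):
    """Build a child that keeps parent1's genes at `keep_positions` (0-based)
    and fills the remaining holes with the missing genes in parent2's order."""
    n = len(parent1)
    child = [None] * n
    for pos in keep_positions:
        child[pos] = parent1[pos]
    kept = {parent1[pos] for pos in keep_positions}
    # Genes not taken from parent1, in the order they appear in parent2.
    fill = iter([gene for gene in parent2 if gene not in kept])
    for pos in range(n):
        if child[pos] is None:
            child[pos] = next(fill)
    return child


parent1 = [1, 7, 2, 4, 3, 5, 6]
parent2 = [6, 5, 2, 1, 3, 7, 4]
# Positions 1, 4, 6, and 7 of Figure 2 are indices 0, 3, 5, and 6 here.
child = position_based_crossover(parent1, parent2, keep_positions=[0, 3, 5, 6])
print(child)  # [1, 2, 3, 4, 7, 5, 6]
```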


Mutation 

The mutation operator prevents the search of the space from becoming too narrow or 
getting hung up in a local minimum. After the production of a child population, this operator 
randomizes small parts of the resulting strings, with a very low probability that any given string 
position will be affected. Mutation is accomplished through the order-based (ref. 11) mutation 
operator, as shown in Figure 3. Each string position is polled; if a given string position (e.g.,
position 2) is randomly selected to undergo mutation, then its content is swapped with a randomly
selected position (e.g., position 4) in the same string.

String before mutation => 1 7 2 4 3 5 6
String after mutation => 1 4 2 7 3 5 6

Figure 3. Order Based Mutation 
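
A hedged Python sketch of this operator follows. The names `swap_positions` and `order_based_mutation` are illustrative, and the way the partner position is drawn (uniformly over the whole string, possibly the same position) is an assumption, since the text does not specify it.

```python
import random


def swap_positions(string, pos_a, pos_b):
    """Return a copy of `string` with the contents of two 1-based positions swapped."""
    result = list(string)
    result[pos_a - 1], result[pos_b - 1] = result[pos_b - 1], result[pos_a - 1]
    return result


def order_based_mutation(string, rate, rng):
    """Poll every position; with probability `rate`, swap it with a random position."""
    result = list(string)
    n = len(result)
    for pos in range(n):
        if rng.random() < rate:
            other = rng.randrange(n)  # assumed: partner drawn uniformly
            result[pos], result[other] = result[other], result[pos]
    return result


# The swap of Figure 3: position 2 is exchanged with position 4.
print(swap_positions([1, 7, 2, 4, 3, 5, 6], 2, 4))  # [1, 4, 2, 7, 3, 5, 6]

mutated = order_based_mutation([1, 7, 2, 4, 3, 5, 6], rate=0.5, rng=random.Random(1))
```

Because the operator only swaps entries, the mutated string always remains a valid ordering with each process number used exactly once.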


Fitness Function 

The previous GA operators (selection, crossover, and mutation) are somewhat problem
independent. The fitness function, however, is typically problem dependent. For the GA used for 
clustering, the GA attempts to move all of the highest valued off diagonal elements of the DSM 
as close to the diagonal as possible. For example, in the DSM in figure 1, the user defined a 
cluster parameter of 1.5, so no off diagonal element with a value less than 1.5 will be considered 
in computing the fitness function. The off diagonal elements that will be part of the calculation of 
the fitness function are in red. The user also defines the value for penalizing the fitness function, 
for example 10. The fitness function routine loops through all cells in the matrix and for each off 
diagonal cell that has a value greater than the cluster parameter (1.5 in figure 1) a fitness value is 
calculated. This calculation multiplies the value of the off diagonal element by the absolute value 
of the difference between the column of the element and the diagonal. Then this product is 
multiplied by the penalty value so that the farther an element is from the diagonal, the
larger its contribution to the fitness function. For example, suppose that an off diagonal element with a value of 2 is
in row 2, column 10, and the cluster penalty is 10. The fitness for that element would be
2*abs(10-2)*10 or 160. However, if the GA moved that element to column 4, then the fitness would
be reduced to 2*abs(4-2)*10 or 40. All of these penalties are summed to create the fitness value.
Thus the GA tends to move the highest valued off diagonal elements closer to the diagonal and
that moves the highly coupled processes closer to one another in the sequence. The graph in
figure 4 demonstrates how the fitness function changes over the generations. Sometimes, there is
a significant reduction, while other times there are no changes for several generations, followed 
by a slight reduction. This graph is from the Langley Organization test case. 
Figure 4. Fitness Function History
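
The fitness calculation described in this section can be sketched in Python as below. This is an illustration under stated assumptions, not the macro's code: the DSM is held as a list of lists, the function and helper names are hypothetical, and an element is counted only when its value is strictly greater than the cluster parameter.

```python
def clustering_fitness(dsm, cluster_parameter, cluster_penalty):
    """Sum value * |column - row| * penalty over the qualifying off diagonal cells."""
    n = len(dsm)
    total = 0
    for row in range(n):
        for col in range(n):
            if row == col:
                continue  # diagonal cells hold the processes themselves
            value = dsm[row][col]
            if value > cluster_parameter:
                total += value * abs(col - row) * cluster_penalty
    return total


def single_entry_dsm(n, row, col, value):
    """Hypothetical helper: an n x n DSM with one off diagonal entry (1-based indices)."""
    dsm = [[0] * n for _ in range(n)]
    dsm[row - 1][col - 1] = value
    return dsm


# The worked example from the text: value 2 in row 2, cluster penalty 10.
far = clustering_fitness(single_entry_dsm(10, 2, 10, 2), 1.5, 10)
near = clustering_fitness(single_entry_dsm(10, 2, 4, 2), 1.5, 10)
print(far, near)  # 160 40
```

A lower total means the strong couplings lie closer to the diagonal, so in this sketch the GA would be minimizing the returned value.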


Applying the Genetic Clustering Algorithm 

This clustering GA is written in Microsoft Visual Basic® as an Excel macro. This 
enhances portability as well as facilitates interaction with existing DSM’s in Excel spreadsheet 
format. There are several aspects to consider when applying this GA. Every aspect is 
accomplished within an Excel Spreadsheet. 

Data input 

The user inputs the problem data in certain cells in column 1 of the Original DSM 
worksheet as shown in Figure 5. 



Figure 5. Worksheet Input Values 


The cell in row 1 column 1 is not to be used because it stores a random number for the random 
number generator. The labels for each of the parameters are in the row immediately above the 
parameter. The user defines certain GA parameters: the maximum number of generations,
not to exceed 500, in row 5 (50); the population size, not to exceed 500, in row 8 (100) (a rule of thumb is to
make the population size about 10 times the maximum number of processes without exceeding the
maximum population size); and the mutation parameter in row 14 (1%
= .01). The user also defines the maximum number of processes, not to exceed 200, in row 11 (10),
and the fitness function values: the cluster parameter (the lowest number to be included in
computing the fitness of a string) in row 17 (3) and the cluster penalty, indicating how much an
element is penalized for being away from the diagonal, in row 20 (10). The numbers in
parentheses above reflect the numbers in Figure 5 for each parameter. The GA keeps track of the
fitness value as it proceeds through each generation. These values are shown in column 1
beginning with row 23. Before executing the genetic algorithm, the user is to store an existing
DSM beginning with the process names in row 1, column 2, and the data beginning in row 1,
column 3.
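
For readers who want to mirror these inputs outside the spreadsheet, the parameters and their stated limits could be captured as in the following Python sketch. The class name, field names, and the `suggested_population_size` helper are hypothetical. The defaults are the values quoted from Figure 5, and the limits are the ones given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterGAParams:
    """Illustrative container for the worksheet inputs (names are assumptions)."""
    max_generations: int = 50      # worksheet row 5, not to exceed 500
    population_size: int = 100     # worksheet row 8, not to exceed 500
    max_processes: int = 10        # worksheet row 11, not to exceed 200
    mutation_rate: float = 0.01    # worksheet row 14
    cluster_parameter: float = 3   # worksheet row 17
    cluster_penalty: float = 10    # worksheet row 20

    def __post_init__(self):
        if not 1 <= self.max_generations <= 500:
            raise ValueError("max_generations must be between 1 and 500")
        if not 2 <= self.population_size <= 500:
            raise ValueError("population_size must be between 2 and 500")
        if not 1 <= self.max_processes <= 200:
            raise ValueError("max_processes must be between 1 and 200")
        if not 0.0 <= self.mutation_rate <= 1.0:
            raise ValueError("mutation_rate must be a probability")


def suggested_population_size(max_processes, limit=500):
    """Rule of thumb from the text: about 10 times the processes, capped at the limit."""
    return min(10 * max_processes, limit)


params = ClusterGAParams()
print(suggested_population_size(params.max_processes))  # 100
```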

Macro Execution 

When opening the ClusterGA Excel file, the user will be asked to Enable Macros. Click 
the button to enable them. To execute the cluster GA, the user selects Macro from the Tools 
menu, and then selects Macros from the Macro menu. Highlight the ClusterGA macro name and 
then click the Run button. The resulting DSM will appear on the Cluster DSM worksheet. Off
diagonal cells used for clustering will be red on both the Original and Clustered DSM worksheets. 
In addition, to the right of the Clustered DSM, there will be a column indicating the position of 
that process in the Original DSM. 

Manual Manipulation of DSM Processes 

If the user wishes to move some processes around the DSM manually, the user can return 
to the Tools menu and again select Macro and Macros. This time, highlight the MoveProcesses
macro and click the Run button. The user will be asked which process to move and which 
process to move it after. The macro will then adjust the rows and columns to make the change. 
This macro continues until the user selects no more processes to be moved. All manual changes 
are stored on the Final DSM worksheet. No changes are made to the other two worksheets. 
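
The reordering performed by such a manual move can be illustrated on the ordering alone. The sketch below is a Python illustration with a hypothetical function name. It does not reproduce the MoveProcesses macro, which also rearranges the worksheet rows and columns.

```python
def move_process_after(order, process, target):
    """Return a new ordering in which `process` directly follows `target`."""
    if process == target:
        raise ValueError("a process cannot be moved after itself")
    if process not in order or target not in order:
        raise ValueError("both processes must be present in the ordering")
    new_order = [p for p in order if p != process]  # take the process out
    new_order.insert(new_order.index(target) + 1, process)  # put it back after target
    return new_order


print(move_process_after([1, 2, 3, 4, 5], process=2, target=4))  # [1, 3, 4, 2, 5]
```

Applying the resulting ordering to both the rows and the columns of the DSM keeps every process on the diagonal.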

Test Cases 

The clustering GA was tried on several test cases. The effect of clustering with the GA is 
demonstrated by showing before and after example cases. The yellow cells (or light gray) are on 
the diagonal. The red cells (or dark gray) indicate all off diagonal cells containing a value used in 
determining the fitness function. Each test case was executed with these parameters: population
size = 300, maximum generations = 400, mutation rate = .01, and cluster penalty = 10. The
number of processes and the cluster parameter varied with each test case. Each DSM reordered
by the GA shows stronger coupling of the off diagonal elements under consideration around the
diagonal. (Note: The spreadsheets in Figures 8, 12, and 14 are from Brady, ref. 6.)
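
To show how the operators and parameters fit together, the following Python sketch runs a small clustering GA from start to finish. It is a simplified illustration, not the ClusterGA macro. The function names, the 0-based process ids, the 50% chance of keeping each position in crossover, the tracking of the best string seen so far, and the made-up 6-process DSM are all assumptions. The population size and generation count are kept well below the test-case settings so that the example runs quickly.

```python
import random


def penalty(dsm, order, cluster_parameter, cluster_penalty):
    """Fitness of the DSM when its processes (0-based ids) are arranged in `order`."""
    total = 0
    for new_r, old_r in enumerate(order):
        for new_c, old_c in enumerate(order):
            value = dsm[old_r][old_c]
            if new_r != new_c and value > cluster_parameter:
                total += value * abs(new_c - new_r) * cluster_penalty
    return total


def crossover(p1, p2, rng):
    """Position-based crossover; each position of p1 is kept with probability 0.5."""
    keep = [rng.random() < 0.5 for _ in p1]
    kept = {gene for gene, k in zip(p1, keep) if k}
    fill = iter([gene for gene in p2 if gene not in kept])
    return [gene if k else next(fill) for gene, k in zip(p1, keep)]


def mutate(order, rate, rng):
    """Order-based mutation: a polled position is swapped with a random position."""
    order = list(order)
    for pos in range(len(order)):
        if rng.random() < rate:
            other = rng.randrange(len(order))
            order[pos], order[other] = order[other], order[pos]
    return order


def cluster_ga(dsm, population_size, max_generations, mutation_rate,
               cluster_parameter, cluster_penalty, seed=0):
    rng = random.Random(seed)
    n = len(dsm)

    def score(order):
        return penalty(dsm, order, cluster_parameter, cluster_penalty)

    # Start from the original order plus random orderings.
    population = [list(range(n))]
    population += [rng.sample(range(n), n) for _ in range(population_size - 1)]
    best = min(population, key=score)
    for _generation in range(max_generations):
        scores = [score(order) for order in population]
        pool = []
        for _ in range(population_size):  # tournament selection
            i, j = rng.sample(range(population_size), 2)
            pool.append(population[i] if scores[i] <= scores[j] else population[j])
        population = [
            mutate(crossover(rng.choice(pool), rng.choice(pool), rng), mutation_rate, rng)
            for _ in range(population_size)
        ]
        best = min(population + [best], key=score)  # remember the best string seen
    return best, score(best)


# Made-up symmetric DSM: pairs (0, 5), (1, 4), and (2, 3) are strongly coupled.
dsm = [[0] * 6 for _ in range(6)]
for a, b in [(0, 5), (1, 4), (2, 3)]:
    dsm[a][b] = dsm[b][a] = 2

best_order, best_penalty = cluster_ga(dsm, population_size=60, max_generations=40,
                                      mutation_rate=0.01, cluster_parameter=1.5,
                                      cluster_penalty=10)
print(best_order, best_penalty)
```

For this DSM the original order has a penalty of 360, and no ordering can score below 120, which is reached when each coupled pair is adjacent.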

Test Case 1- A DSM was constructed to evaluate the degree of communication between different 
branches from two different organizations at the NASA Langley Research Center. Figure 6 
shows the original DSM filled in with the original scores. There were 12 processes and the 
cluster parameter was 1.5. The clustering GA was applied to the DSM in Figure 6 and the results 
are shown in Figure 7. This clearly shows that the GA provided the desired behavior for 
clustering along the diagonal. 
Figure 6. Langley Organization Test Case (before GA)
Figure 7. Langley Organization Test Case (after GA)


Test Case 2-This example was taken from Brady (ref. 6). It demonstrates the grouping of bicycle 
components into subsystems. The original DSM is given in Figure 8 and represents the strength 
of the interfaces among the different components. There were 10 processes and the cluster 
parameter was 4. Figure 9 shows the results of applying the GA to the DSM. The Excel 
spreadsheet indicates the original numbers for each process in a separate column. 
Figure 8. Bicycle test case (before GA)


Row labels recovered from the figure: Pedals, Chain, Gear Shift, Gears, Wheels, Brake, Brake Handle, Handlebars, Frame, Odometer.

Figure 9. Bicycle test case (after GA) 


Test Case 3-This example was taken from the Next Generation Launch Vehicle (NGLV) study. 
The original DSM is given in Figure 10 and represents the strength of the interfaces for the 
different organizations composing the NGLV task. There were 46 processes and the cluster 
parameter was 6. Due to size restrictions, these numbers cannot be shown in the figure. Figure
11 shows the results of applying the GA to the DSM.
Figure 10. NGLV Organization (before GA)



Test Case 4-This example was taken from Brady (ref. 6). It demonstrates the dependency of the 
components for Mars Global Surveyor associated with propulsion, attitude control, 
telecommunications, and science instruments. The original DSM is given in Figure 12 and 
represents the interfaces among the different components based on a risk factor. There were 26 
processes and the cluster parameter was 4. Due to size restrictions, these numbers cannot be 
shown in the figure. Figure 13 shows the results of applying the GA to the DSM. 


Row labels recovered from the figure: Delta Launch Vehicle, Prop Mod Structure, Main Engine, Attitude Control Thrusters, Star Tracker, IMU, Sun Sensor, Horizon Sensor, Reaction Wheels, Deep Space Network, Ground Data Handling Control, High Gain Antenna, Low Gain Antenna, Transponder, Telecommunications Satellite, Solid State Data Recorders, Electronic Data Computer, Payload Computer, Mars Relay Antenna, Mag & Electron Reflectometer, Mars Orbiter Camera, Mars Orbiter Laser Altimeter, Thermal Emission Spectrometer, Ultrastable Oscillator, Solar Array Aerobrake Flaps, NiH2 Batteries.



Figure 12. Mars Global Surveyor (before GA) 
Figure 13. Mars Global Surveyor (after GA)

Test Case 5-This example was taken from Brady (ref. 6). It demonstrates the components of the 
Mars Global Surveyor associated with propulsion, telecommunications, power and payload. The 
original DSM is given in Figure 14 and represents the interfaces among the
different components based on a risk factor. There were 20 processes and the cluster parameter 
was 4. Due to size restrictions, these numbers cannot be shown in the figure. Figure 15 shows the 
results of applying the GA to the DSM. 
Figure 14. Mars Climate Orbiter (before GA)
Figure 15. Mars Climate Orbiter (after GA)


Test Case 6-This example was taken from data generated by a Skills Assessment Team... The 
Skills Assessment Team filled out a DSM focusing on in-house skills needed to provide technical 
and management insight leading to a down select for the Crew Exploration Vehicle (CEV). The 
original DSM is given in Figure 16 and represents the strength of the interfaces among the 
different skills being assessed. There were 36 processes and the cluster parameter was 7. Due to 
size restrictions, these numbers cannot be shown in the figure. Figure 17 shows the results of 
applying the GA to the DSM. 


Row labels recovered from the figure: LEADERSHIP (140), APPLAERO (99), MMA (86), IASYS (85), COSTEST (121), ELESYS (13), ELMAG (12), TSENV (23), MISEXC (4), FLDSYS (106), VPPI (5), HUMFAC (40), MISEXC (4), LOGSUPTRAN (134), RMEA (28), MATENG (166), BOOMSCI (74), ADVMATSCI (65), MECSYS (17), MISEXC (4), MMA (86), ROCPRO (73), FLTDSG (2), QEA (23), RMEA (28), RISKMMT (123), AEROSEN (96), SWENG (82), STRSYS (63), RMEA (28), SAFENG (27), SYSENG (7), INTEGENG (9), SYSENG (7), THMSYS (104), EDTECH (122).



Figure 16. Skills Assessment (before GA) 


Row labels recovered from the figure: STRSYS (63), MECSYS (17), BOOMSCI (74), ELMAG (12), TSENV (23), ELESYS (13), AEROSEN (96), FLDSYS (106), THMSYS (104), APPLAERO (99), MMA (86), MMA (86), ROCPRO (73), FLTDSG (2), MISEXC (4), MISEXC (4), MISEXC (4), EDTECH (122), IASYS (85), SWENG (82), HUMFAC (40), ADVMATSCI (65), MATENG (166), QEA (23), SYSENG (7), INTEGENG (9), SAFENG (27), SYSENG (7), LOGSUPTRAN (134), RMEA (28), VPPI (5), RMEA (28), RMEA (28), COSTEST (121), RISKMMT (123), LEADERSHIP (140).



Figure 17. Skills Assessment (after GA) 

Test Case 7-This example was taken from skills data generated by the Investigation Definition 
Team (IDT)... The Investigation Definition Team filled out a DSM focusing on in-house skills 
needed to provide technical and management insight leading to a down select for the Crew 
Exploration Vehicle (CEV). The original DSM is given in Figure 18 and represents the strength
of the interfaces among the different skills being assessed. There were 64 processes and the 
cluster parameter was 3.9. Figure 19 shows the results of applying the GA to the DSM. 



Figure 18. IDT Formulation (before GA) 
Figure 19. IDT Formulation (after GA)


Summary 

Numerous DSM’s exist in Excel spreadsheet format. Some of them contain data that can 
be used in technology assessment. Design managers needed to cluster the tightly coupled 
processes along the diagonal to better visualize the impact they have on one another. Other than 
using the tedious method of moving the processes manually, no tool existed to aid in solving this 
clustering problem. 

A literature search found that GA’s had been applied to clustering problems and in optimizing the 
sequence of processes displayed in DSM format. It was also found that DSM’s had been applied 
to clustering problems. However, no tools were found that coupled the GA and the DSM to solve
the clustering problem. 

A GA was coded in Visual Basic as an Excel macro to interface with existing DSM’s in Excel 
spreadsheet format. The GA was tested with a variety of DSM’s. Some of the DSM’s were small 
(10 processes) and some were much larger (64 processes). Some of the DSM’s were sparsely 
populated with data while others were densely populated. In each case, the resulting optimized 
DSM had moved the tightly coupled processes to be clustered around the diagonal. 